
    What does validation of cases in electronic record databases mean? The potential contribution of free text

    Electronic health records are increasingly used for research. The definition of cases or endpoints often relies on coded diagnostic data, using a pre-selected group of codes. Validation of these cases, as ‘true’ cases of the disease, is crucial. There are, however, ambiguities in what is meant by validation in the context of electronic records. Validation usually implies comparison of a definition against a gold standard of diagnosis and the ability to identify false negatives (‘true’ cases which were not detected) as well as false positives (detected cases which did not have the condition). We argue that two separate concepts of validation are often conflated in existing studies: first, whether the GP thought the patient was suffering from a particular condition (which we term confirmation or internal validation); and second, whether the patient really had the condition (external validation). Few studies have the ability to detect false negatives, patients who never received a diagnostic code. Natural language processing is likely to open up the use of free text within the electronic record, which will facilitate both the validation of the coded diagnosis and the search for false negatives.
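
    To make the distinction concrete, the arithmetic below (all counts invented for illustration) shows why code-list validation studies that report only positive predictive value cannot speak to sensitivity, which requires exactly the false negatives the abstract highlights.

```python
# Illustrative sketch: why detecting false negatives matters for validation.
# All counts are made up; the 'gold standard' here stands for external
# validation (did the patient really have the condition), not GP confirmation.

tp = 180   # coded as a case, truly a case
fp = 20    # coded as a case, not truly a case
fn = 40    # truly a case, but never received a diagnostic code
tn = 9760  # neither coded nor a true case

ppv = tp / (tp + fp)          # what code-list validation studies usually report
sensitivity = tp / (tp + fn)  # needs false negatives, rarely measurable from codes alone
specificity = tn / (tn + fp)

print(f"PPV         = {ppv:.2f}")          # 0.90
print(f"Sensitivity = {sensitivity:.2f}")  # 0.82
print(f"Specificity = {specificity:.3f}")
```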

    Data quality in European primary care research databases. Report of a workshop held in London September 2013

    Primary care research databases provide a significant resource for health services and epidemiological research. However, since data are recorded primarily for clinical care, their suitability for research may vary widely according to the research application or the recording practices of individual general practitioners. A methodological approach for characterising data quality is required. We describe a one-day workshop entitled “Towards a common protocol for measuring and monitoring data quality in European primary care research databases”. Researchers, database experts and clinicians were invited to give their perspectives on data quality and to exchange ideas on what data quality metrics should be made available to researchers. We report the main outcomes of this workshop, including a summary of the presentations and discussions and a suggested way forward.

    A pragmatic approach for measuring data quality in primary care databases

    There is currently no widely recognised methodology for undertaking data quality assessment in electronic health records used for research. In an attempt to address this, we have developed a protocol for measuring and monitoring data quality in primary care research databases, whereby practice-based data quality measures are tailored to the intended use of the data. Our approach was informed by an in-depth investigation of aspects of data quality in the Clinical Practice Research Datalink GOLD database and presentations of the results to data users. Although based on a primary care database, much of our proposed approach would be equally applicable to other health care databases.
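
    As one illustration of a measure tailored to intended use, the sketch below computes per-practice completeness of a single variable and flags practices below a threshold. The record layout, the variable and the 80% cut-off are assumptions for the example, not taken from the protocol.

```python
# Hypothetical sketch of a practice-level data quality screen: for a study that
# needs smoking status, measure per-practice completeness and flag outliers.

from collections import defaultdict

records = [
    {"practice": "P001", "patient": 1, "smoking_status": "never"},
    {"practice": "P001", "patient": 2, "smoking_status": None},
    {"practice": "P002", "patient": 3, "smoking_status": "current"},
    {"practice": "P002", "patient": 4, "smoking_status": "ex"},
]

totals, complete = defaultdict(int), defaultdict(int)
for r in records:
    totals[r["practice"]] += 1
    if r["smoking_status"] is not None:
        complete[r["practice"]] += 1

for practice, n in totals.items():
    completeness = complete[practice] / n
    flag = "" if completeness >= 0.8 else "  <- review before use"
    print(f"{practice}: {completeness:.0%} complete{flag}")
```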

    Quality of recording of diabetes in the UK: how does the GP’s method of coding clinical data affect incidence estimates? Cross-sectional study using the CPRD database

    Objective: To assess the effect of coding quality on estimates of the incidence of diabetes in the UK between 1995 and 2014. Design: A cross-sectional analysis examining diabetes coding from 1995 to 2014 and how the choice of codes (diagnosis codes vs codes which suggest diagnosis) and the quality of coding affect estimated incidence. Setting: Routine primary care data from 684 practices contributing to the UK Clinical Practice Research Datalink (data contributed from Vision (INPS) practices). Main outcome measure: Incidence rates of diabetes and how they are affected by (1) GP coding and (2) excluding ‘poor’ quality practices with at least 10% of incident patients inaccurately coded between 2004 and 2014. Results: Incidence rates and accuracy of coding varied widely between practices, and the trends differed according to the selected category of code. If diagnosis codes were used, the incidence of type 2 diabetes increased sharply until 2004 (when the UK Quality and Outcomes Framework was introduced), then flattened off until 2009, after which it decreased. If non-diagnosis codes were included, the numbers continued to increase until 2012. Although coding quality improved over time, 15% of the 666 practices that contributed data between 2004 and 2014 were labelled ‘poor’ quality. When these practices were dropped from the analyses, the downward trend in the incidence of type 2 diabetes after 2009 became less marked and incidence rates were higher. Conclusions: In contrast to some previous reports, diabetes incidence (based on diagnostic codes) appears not to have increased since 2004 in the UK. The choice of codes can make a significant difference to incidence estimates, as can the quality of recording. Codes and data quality should be checked when assessing incidence rates using GP data.
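
    The sketch below illustrates the kind of sensitivity analysis the paper implies: recomputing yearly incidence under a narrow versus a broad code list, with and without practices flagged as poor quality. All events, denominators and flags are invented for the example.

```python
# Illustrative sensitivity analysis: diabetes incidence per 1,000 person-years
# under (a) diagnosis codes only vs (b) diagnosis plus suggestive codes, and
# with or without practices flagged as 'poor' quality.

events = [  # (year, practice, code_category) -- synthetic
    (2009, "P1", "diagnosis"), (2009, "P2", "suggestive"),
    (2010, "P1", "diagnosis"), (2010, "P3", "diagnosis"),
]
person_years = {2009: 150_000, 2010: 152_000}  # synthetic denominators
poor_quality = {"P3"}  # e.g. >=10% of incident patients inaccurately coded

def incidence(year, include_suggestive=False, drop_poor=False):
    n = sum(
        1 for (y, p, cat) in events
        if y == year
        and (include_suggestive or cat == "diagnosis")
        and not (drop_poor and p in poor_quality)
    )
    return 1000 * n / person_years[year]  # per 1,000 person-years

for year in (2009, 2010):
    print(year,
          f"narrow: {incidence(year):.4f}",
          f"broad, poor dropped: {incidence(year, True, True):.4f}")
```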

    Optimising use of electronic health records to describe the presentation of rheumatoid arthritis in primary care: a strategy for developing code lists

    Background: Research using electronic health records (EHRs) relies heavily on coded clinical data. Because of variation in coding practices, it can be difficult to aggregate the codes for a condition in order to define cases. This paper describes a methodology to develop ‘indicator markers’ found in patients with early rheumatoid arthritis (RA); these are a broader range of codes which may allow a probabilistic case definition for use where no diagnostic code has yet been recorded. Methods: We examined the EHRs of 5,843 patients in the General Practice Research Database, aged ≥30 years, with a first coded diagnosis of RA between 2005 and 2008. Lists of indicator markers for RA were developed initially by panels of clinicians drawing up code-lists and then modified based on scrutiny of the available data. The prevalence of indicator markers, and their temporal relationship to RA codes, was examined in patients from 3 years before to 14 days after the recorded RA diagnosis. Findings: Indicator markers were common throughout the EHRs of RA patients, with 83.5% having two or more markers. 34% of patients received a disease-specific prescription before RA was coded, 42% had a referral to rheumatology, and 63% had a test for rheumatoid factor. 65% had at least one joint symptom or sign recorded, and in 44% this was at least 6 months before the recorded RA diagnosis. Conclusion: Indicator markers of RA may be valuable for case definition in patients who do not yet have a diagnostic code. The clinical diagnosis of RA is likely to occur some months before it is coded, as shown by markers frequently occurring ≥6 months before the recorded diagnosis. It is difficult to differentiate delay in diagnosis from delay in recording. Information concealed in free text may be required for the accurate identification of patients and to assess the quality of care in general practice.
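
    A minimal sketch of the windowing step described in the methods (events from 3 years before to 14 days after the first RA code), with placeholder marker categories and invented dates:

```python
# Sketch: for each RA patient, keep coded events falling between 3 years before
# and 14 days after the first RA diagnosis code, then collect the distinct
# indicator-marker categories seen. Marker lists here are placeholders, not the
# paper's code-lists.

from datetime import date, timedelta

indicator_markers = {"referral_rheum", "rheumatoid_factor_test", "joint_symptom"}

def markers_in_window(events, ra_date):
    """events: list of (date, marker_category). Returns categories in window."""
    lo = ra_date - timedelta(days=3 * 365)
    hi = ra_date + timedelta(days=14)
    return {cat for (d, cat) in events
            if lo <= d <= hi and cat in indicator_markers}

events = [(date(2006, 3, 1), "joint_symptom"),
          (date(2007, 1, 10), "rheumatoid_factor_test"),
          (date(2001, 5, 5), "joint_symptom")]  # too early: outside window
print(markers_in_window(events, ra_date=date(2007, 6, 1)))
```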

    Determining the date of diagnosis – is it a simple matter? The impact of different approaches to dating diagnosis on estimates of delayed care for ovarian cancer in UK primary care

    Background: Studies of cancer incidence and early management will increasingly draw on routine electronic patient records. However, data may be incomplete or inaccurate. We developed a generalisable strategy for investigating presenting symptoms and delays in diagnosis, using ovarian cancer as an example. Methods: The General Practice Research Database was used to investigate the time between first report of a symptom and diagnosis in 344 women diagnosed with ovarian cancer between 01/06/2002 and 31/05/2008. The effects of possible inaccuracies in the dating of diagnosis on the frequencies and timing of the most commonly reported symptoms were investigated using four increasingly inclusive definitions of first diagnosis/suspicion: 1. "Definite diagnosis"; 2. "Ambiguous diagnosis"; 3. "First treatment or complication suggesting pre-existing diagnosis"; 4. "First relevant test or referral". Results: The most commonly coded symptoms before a definite diagnosis of ovarian cancer were abdominal pain (41%), urogenital problems (25%), abdominal distension (24%) and constipation/change in bowel habits (23%), with 70% of cases reporting at least one of these. The median time between first reporting each of these symptoms and diagnosis was 13, 21, 9.5 and 8.5 weeks respectively. 19% had a code for definitions 2 or 3 prior to definite diagnosis and 73% a code for definition 4. However, the proportions with symptoms and the delays were similar for all four definitions except definition 4, where the median delays were 8, 8, 3, 10 and 0 weeks respectively. Conclusion: Symptoms recorded in the General Practice Research Database are similar to those reported in the literature, although their frequency is lower than in studies based on self-report. Generalisable strategies for exploring the impact of recording practice on the date of diagnosis in electronic patient records are recommended, and studies which date diagnoses in GP records need to present sensitivity analyses based on investigation, referral and diagnosis data. Free text information may be essential for obtaining accurate estimates of incidence and for accurate dating of diagnoses.
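
    The four definitions lend themselves to a simple "earliest qualifying record" rule. The sketch below applies them to an invented patient record; the event-type names are placeholders, not the database's actual codes.

```python
# Sketch: date the diagnosis under four increasingly inclusive definitions by
# taking the earliest record whose event type the definition admits.

from datetime import date

DEFINITIONS = {  # definition number -> event types it admits
    1: {"definite_diagnosis"},
    2: {"definite_diagnosis", "ambiguous_diagnosis"},
    3: {"definite_diagnosis", "ambiguous_diagnosis",
        "treatment_or_complication"},
    4: {"definite_diagnosis", "ambiguous_diagnosis",
        "treatment_or_complication", "relevant_test_or_referral"},
}

record = [(date(2005, 1, 20), "relevant_test_or_referral"),
          (date(2005, 3, 5), "ambiguous_diagnosis"),
          (date(2005, 4, 2), "definite_diagnosis")]

for defn, allowed in DEFINITIONS.items():
    dates = [d for (d, kind) in record if kind in allowed]
    print(f"definition {defn}: diagnosis dated {min(dates)}")
```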

    How many mailouts? Could attempts to increase the response rate in the Iraq war cohort study be counterproductive?

    Background: Low response and reporting errors are major concerns for survey epidemiologists. However, while nonresponse is commonly investigated, the effects of misclassification are often ignored, possibly because they are hard to quantify. We investigate both sources of bias in a recent study of the effects of deployment to the 2003 Iraq war on the health of UK military personnel, and attempt to determine whether improving response rates by multiple mailouts was associated with increased misclassification error and hence increased bias in the results. Methods: Data for 17,162 UK military personnel were used to determine factors related to response, and inverse probability weights were used to assess nonresponse bias. The percentages of inconsistent and missing answers to health questions from the 10,234 responders were used as measures of misclassification in a simulation of the 'true' relative risks that would have been observed if misclassification had not been present. Simulated and observed relative risks of multiple physical symptoms and post-traumatic stress disorder (PTSD) were compared across response waves (number of contact attempts). Results: Age, rank, gender, ethnic group, enlistment type (regular/reservist) and contact address (military or civilian), but not fitness, were significantly related to response. Weighting for nonresponse had little effect on the relative risks. Of the respondents, 88% had responded by wave 2. Missing answers (total 3%) increased significantly (p < 0.001) between waves 1 and 4, from 2.4% to 7.3%, and the percentage with discrepant answers (total 14%) increased from 12.8% to 16.3% (p = 0.007). However, the adjusted relative risks decreased only slightly, from 1.24 to 1.22 for multiple physical symptoms and from 1.12 to 1.09 for PTSD, and showed a similar pattern to those simulated. Conclusion: Bias due to nonresponse appears to be small in this study, and increasing the response rates had little effect on the results. Although misclassification is difficult to assess, the results suggest that bias due to reporting errors could be greater than bias caused by nonresponse. Resources might be better spent on improving and validating the data, rather than on increasing the response rate.
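
    A minimal sketch of the inverse-probability-weighting step the methods describe, on synthetic data with illustrative covariates (the study's actual model and variables are not reproduced here):

```python
# Sketch of inverse probability weighting for nonresponse: model response on
# covariates known for everyone, then weight each responder by 1 / P(response).
# All data below are synthetic stand-ins.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([rng.integers(18, 55, n),   # age
                     rng.integers(0, 2, n)])    # e.g. regular vs reservist
# Simulate a response mechanism that depends on the covariates.
p_respond = 1 / (1 + np.exp(-(-2.0 + 0.05 * X[:, 0] + 0.4 * X[:, 1])))
responded = rng.random(n) < p_respond

model = LogisticRegression(max_iter=1000).fit(X, responded)
weights = 1.0 / model.predict_proba(X)[:, 1]

# Weighted analyses of responders would use weights[responded]; if weighting
# barely moves the estimates (as in this study), nonresponse bias is likely small.
print("mean weight among responders:", weights[responded].mean().round(2))
```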